In the spirit of open science, open data and reproducible science (Commission, Research, and Innovation 2021) we share the dataset (Bergami et al. 2022) and the statistical analysis used in the papers L’Astorina et al., 2023 DOI and Bergami et al., 2023 DOI for exploring environmental Citizen Science practices and scientists’ attitudes at ILTER, starting from the results of a global survey.
The exact wording of the questions and the corresponding answer options are listed in the following table.
## import dataset
dataset <- readxl::read_excel("ILTER_PublicEngagement_forPapers.xlsx")
dataset$age <- as.numeric(format(Sys.Date(), "%Y")) - as.numeric(dataset$Q33)
rmarkdown::paged_table(dataset, options = list(rows.print = 15))
questions <- readxl::read_excel("wording.xlsx")
DT::datatable(
questions,
rownames= FALSE,
extensions = c('FixedColumns', "FixedHeader"),
options = list(
dom = 't',
ordering = F,
scrollY = '600px',
paging = FALSE,
scrollX = TRUE,
fixedHeader=TRUE,
fixedColumns = list(leftColumns = 1)
)
)
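The `age` column created during import is computed from `Sys.Date()`, so its values shift each year the document is re-rendered. The following is a minimal sketch of a render-independent alternative. It assumes the survey year (2020) is the intended reference; `birth_year` is a toy stand-in for `dataset$Q33`, and `survey_year` and `age_at_survey` are illustrative names, not part of the original analysis.

```r
# Toy birth years standing in for dataset$Q33 (illustrative values only)
birth_year <- c("1965", "1980", NA, "1992")

# Assumption: ages are referred to the year the questionnaire was open (2020)
survey_year <- 2020

# as.numeric() converts the text values; NA stays NA
age_at_survey <- survey_year - as.numeric(birth_year)
age_at_survey
```

With a fixed reference year, the age classes derived later stay the same whenever the report is rebuilt.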
If you want to reproduce the analysis, please visit the GitHub repo and download the compressed repository file or clone it. But please …
… remember to cite this document (Oggioni and Bergami 2022) and the dataset (Bergami et al. 2022) if you want to use them for other publications or analyses.
The link to the survey was sent to all the ILTER site managers through the ILTER secretariat contact list (850 email recipients). The questionnaire remained open from the end of February to mid-September 2020 with two reminders sent within this period. In total, we received 163 responses; based on an estimated 850 participating scientists, our response rate is 17%.
First part of the survey, used in L’Astorina et al., 2023 DOI
Number of answers with a completeness >= 75 %:
poolP1 <- dataset %>%
dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
dplyr::count()
165
Second part of the survey, used in Bergami et al., 2023 DOI
Number of answers with a completeness >= 75 % of the columns from Q10 to Q30:
poolP2 <- dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
# dplyr::filter(Finished == 'True') %>% # = 75
dplyr::count()
77
The number of persons who accessed the survey (without necessarily finishing it):
296
The response rate is:
respRate <- round((totalAnswers/850)*100, 2)
34.82 %
The number of persons who finished the survey (no information about the completeness):
142
The response rate considering answers with a completeness >= 75 % (the `poolP1` count computed above):
compl50rate <- round((poolP1/850)*100, 2)
19.41
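The two response rates above are simple ratios against the 850 e-mail recipients. A small sketch of the arithmetic follows, using the counts reported in this document; the helper name `rate` and the variable names are illustrative only.

```r
# Counts reported in this document
invited    <- 850  # e-mail recipients
accessed   <- 296  # persons who accessed the survey
complete75 <- 165  # answers with completeness >= 75 % (first part)

# Hypothetical helper: percentage rounded to two decimals
rate <- function(n, total) round(n / total * 100, 2)

rate_accessed <- rate(accessed, invited)
rate_complete <- rate(complete75, invited)

# dplyr::count() returns a one-row data frame; extracting the `n` column
# gives a plain number that is easier to reuse in text and further arithmetic
pool <- data.frame(n = complete75)
rate_from_df <- rate(pool$n, invited)
```

Here `rate_accessed` and `rate_complete` reproduce the 34.82 and 19.41 shown above.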
The number of answers where the reference to the ILTER site, via DEIMS.ID, was NOT indicated:
201
The number of answers where the reference to the ILTER site, via DEIMS.ID, was indicated:
95
The number of answers with DEIMS.iD among them:
52
The number of answers with LTER network information among them:
77
The number of participants in CS initiative among respondents:
96
The number of participants in CS initiative among respondents with a completeness >= 50 %:
90
The number of CS initiatives declared among respondents with a completeness >= 50 %:
392
## Join with ILTER DEIMS GeoInfo
# Connect and download layers from LTER-Europe's GeoServer
fileName <- tempfile()
download.file("https://data.lter-europe.net/geoserver/deims/wfs?SERVICE=WFS&VERSION=1.0.0&REQUEST=GetFeature&TYPENAME=deims:ilter_all_formal&SRSNAME=EPSG:4326", fileName)
request <- rwfs::GMLFile$new(fileName)
client <- rwfs::WFSCachingClient$new(request)
ilter_all_formal <- client$getLayer("ilter_all_formal")
## Reading layer `ilter_all_formal' from data source
## `/private/var/folders/p1/110rx8q101z0wn0bwh4njrcw0000gn/T/RtmpyTrOO1/file1843219fc0842'
## using driver `GML'
## Simple feature collection with 755 features and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -156.5648 ymin: -78 xmax: 175.085 ymax: 79
## CRS: NA
sitesOnSurvey <- ilter_all_formal[ilter_all_formal$deimsid %in% dataset$Q30, ]
htmltools::div(
style = htmltools::css(width="100%", height='100%'),
leaflet::leaflet(sitesOnSurvey) %>%
leaflet::addTiles() %>%
# addMouseCoordinates() %>%
# leaflet::setView(lng = , lat = 23.16001, zoom = 1) %>%
leaflet::addMarkers(
clusterOptions = leaflet::markerClusterOptions(),
popup = paste0(
# "Name: <b>", sitesOnSurvey$name, "</b><br/>",
"DEIMS.ID: <b><a target = '_blank' href = '", sitesOnSurvey$deimsid, "'>", sitesOnSurvey$deimsid, "</a></b><br/>"
),
group = "Sites"
) %>%
leaflet::addLayersControl(position = 'bottomright',
overlayGroups = c(
"Sites"#,
# "Biome"
),
options = leaflet::layersControlOptions(collapsed = FALSE)
)
)
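The site selection above keeps the rows of the DEIMS layer whose `deimsid` appears in the survey column `Q30`. A minimal sketch of that matching on toy data follows; the identifiers and site names are placeholders, not real DEIMS records. It also shows one way to list identifiers given by respondents that match no site, which can help spot typos.

```r
# Toy stand-ins for the site layer and for dataset$Q30 (placeholder IDs)
sites <- data.frame(
  deimsid = c("https://deims.org/aaa", "https://deims.org/bbb", "https://deims.org/ccc"),
  name    = c("Site A", "Site B", "Site C")
)
q30 <- c("https://deims.org/bbb", NA, "https://deims.org/zzz", "https://deims.org/bbb")

# Keep only the sites referenced by at least one answer
sites_on_survey <- sites[sites$deimsid %in% q30, ]

# Identifiers entered by respondents that are absent from the layer
unmatched <- setdiff(unique(q30[!is.na(q30)]), sites$deimsid)
```

Because `%in%` tests membership, a site mentioned by several respondents still appears once in `sites_on_survey`.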
dataset %>%
dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
dplyr::select(Q31) %>%
dplyr::mutate(Q31 = factor(Q31) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q31 = "Role in the site"
)
)
| Characteristic | N = 165¹ |
|---|---|
| Role in the site | |
| Collaborator | 21 (13%) |
| Data manager | 9 (5.5%) |
| National Network coordinator | 11 (6.7%) |
| Other | 26 (16%) |
| Site manager | 62 (38%) |
| (Missing) | 36 (22%) |
| ¹ n (%) | |
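The summary tables in this document report missing answers as an explicit "(Missing)" row. As a rough base-R sketch of the same idea (not the gtsummary implementation), the toy vector below is counted with `NA` turned into a regular factor level; the values are invented for illustration.

```r
# Toy answers standing in for a categorical survey column
role <- c("Site manager", "Collaborator", NA, "Site manager", NA, "Other")

# addNA() makes NA a regular level so it is counted like any other category
role_f <- addNA(factor(role))
levels(role_f)[is.na(levels(role_f))] <- "(Missing)"

counts <- table(role_f)
pct <- round(100 * prop.table(counts), 1)

summary_tbl <- data.frame(
  level = names(counts),
  n     = as.vector(counts),
  pct   = as.vector(pct)
)
summary_tbl
```

The percentages are computed over all rows, missing ones included, which is also how the "(Missing)" rows in the tables are expressed.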
dataset %>%
dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
dplyr::select(Q32) %>%
dplyr::mutate(Q32 = factor(Q32) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q32 = "Career level"
)
)
| Characteristic | N = 165¹ |
|---|---|
| Career level | |
| Graduate student | 2 (1.2%) |
| Junior (for example, post-doc, assistant professor, entry-level researcher) | 17 (10%) |
| Mid-career (for example, associated professor, mid-level manager) | 37 (22%) |
| Other | 3 (1.8%) |
| Retired (including emeritus) | 3 (1.8%) |
| Senior (for example, professor, senior manager, administrator) | 72 (44%) |
| (Missing) | 31 (19%) |
| ¹ n (%) | |
trial %>%
dplyr::select(response) %>%
# making the NA value explicit level of factor with `forcats::fct_explicit_na()`
dplyr::mutate(response = factor(response) %>%forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary()
| Characteristic | N = 200¹ |
|---|---|
| response | |
| 0 | 132 (66%) |
| 1 | 61 (30%) |
| (Missing) | 7 (3.5%) |
| ¹ n (%) | |
dataset %>%
dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
dplyr::select(c(age, Q10, Q31:Q36)) %>%
# `age` is the number of years since the birth year (Q33), so `decade` is a ten-year age class
dplyr::mutate(decade = floor(age/10)*10) %>%
dplyr::select(decade) %>%
dplyr::mutate(decade = factor(decade) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
decade = "Age class (decade)"
)
)
| Characteristic | N = 165¹ |
|---|---|
| Age class (decade) | |
| 20 | 1 (0.6%) |
| 30 | 8 (4.8%) |
| 40 | 21 (13%) |
| 50 | 41 (25%) |
| 60 | 27 (16%) |
| 70 | 5 (3.0%) |
| (Missing) | 62 (38%) |
| ¹ n (%) | |
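The binning used above, `floor(age/10)*10`, maps each age to the lower bound of its ten-year class. A short sketch on invented ages:

```r
# Toy ages (illustrative values only)
age <- c(27, 34, 49, 50, 61, NA)

# 27 -> 20, 34 -> 30, 49 -> 40, 50 -> 50, 61 -> 60; NA stays NA
decade <- floor(age / 10) * 10

table(decade, useNA = "ifany")
```

An age of exactly 50 falls in the 50 class, so each class covers the closed-open interval from its label up to, but not including, the next label.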
dataset %>%
dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
dplyr::select(Regions) %>%
dplyr::mutate(Regions = factor(Regions) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary()
| Characteristic | N = 165¹ |
|---|---|
| Regions | |
| Africa | 5 (3.0%) |
| Americas | 6 (3.6%) |
| East-Asia-Pacific (EAP) | 11 (6.7%) |
| LTER Europe | 96 (58%) |
| US LTER | 16 (9.7%) |
| (Missing) | 31 (19%) |
| ¹ n (%) | |
dataset %>%
dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
dplyr::select(Q34_1:Q34_6) %>%
tidyr::gather(questions, levelOfWillingness) %>%
dplyr::mutate(questions = ifelse(questions == "Q34_1", "Collaborations with the public on scientific research (i.e., Citizen Science)", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q34_2", "Face-to-face science discussions and activities with the public", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q34_3", "Online science discussions and activities with the public", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q34_4", "Interviews with journalists or other media professionals about science", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q34_5", "Direct interactions with government policy makers about science", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q34_6", "Any form of public engagement with science involving children or young adults (18 years or younger)", questions)) %>%
dplyr::mutate(
levelOfWillingness = factor(
levelOfWillingness,
levels = c("Very\r\nunwilling", "Slightly \r\nunwilling", "Neither unwilling\r\nor willing", "Slightly \r\nwilling", "Very\r\nwilling")
) %>% forcats::fct_explicit_na()
) %>%
gtsummary::tbl_summary(
by = c(questions),
label = list(
levelOfWillingness = "Level of willingness"
)
) %>%
gtsummary::modify_header(label = "**Questions**")
| Questions | Any form of public engagement with science involving children or young adults (18 years or younger), N = 165¹ | Collaborations with the public on scientific research (i.e., Citizen Science), N = 165¹ | Direct interactions with government policy makers about science, N = 165¹ | Face-to-face science discussions and activities with the public, N = 165¹ | Interviews with journalists or other media professionals about science, N = 165¹ | Online science discussions and activities with the public, N = 165¹ |
|---|---|---|---|---|---|---|
| Level of willingness | ||||||
| Very unwilling | 8 (4.8%) | 11 (6.7%) | 9 (5.5%) | 6 (3.6%) | 11 (6.7%) | 9 (5.5%) |
| Slightly unwilling | 14 (8.5%) | 10 (6.1%) | 6 (3.6%) | 8 (4.8%) | 17 (10%) | 26 (16%) |
| Neither unwilling or willing | 13 (7.9%) | 16 (9.7%) | 13 (7.9%) | 15 (9.1%) | 18 (11%) | 23 (14%) |
| Slightly willing | 55 (33%) | 34 (21%) | 55 (33%) | 46 (28%) | 39 (24%) | 72 (44%) |
| Very willing | 75 (45%) | 94 (57%) | 82 (50%) | 88 (53%) | 80 (48%) | 34 (21%) |
| (Missing) | 0 (0%) | 0 (0%) | 0 (0%) | 2 (1.2%) | 0 (0%) | 1 (0.6%) |
| ¹ n (%) | ||||||
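The chunk above replaces each column code with its question wording through a chain of `ifelse()` calls and then fixes the order of the answer levels. A compact alternative, sketched here on toy data, is a named lookup vector plus `factor()` with explicit levels. Only two of the six wordings are shown, and the object names `wording`, `long` and `willingness_levels` are illustrative.

```r
# Lookup from column code to question wording (two items shown for brevity)
wording <- c(
  Q34_1 = "Collaborations with the public on scientific research (i.e., Citizen Science)",
  Q34_2 = "Face-to-face science discussions and activities with the public"
)

# Toy answers in long format; "\r\n" mimics the line breaks stored in the spreadsheet cells
long <- data.frame(
  questions = c("Q34_1", "Q34_2", "Q34_1"),
  levelOfWillingness = c("Very\r\nwilling", "Slightly \r\nwilling", NA)
)

# Indexing the named vector by code returns the wording for every row at once
long$questions <- unname(wording[long$questions])

# Explicit levels keep the scale in its logical order instead of alphabetical order
willingness_levels <- c("Very\r\nunwilling", "Slightly \r\nunwilling",
                        "Neither unwilling\r\nor willing",
                        "Slightly \r\nwilling", "Very\r\nwilling")
long$levelOfWillingness <- factor(long$levelOfWillingness, levels = willingness_levels)
```

The level strings must match the stored answers character for character, including the embedded line breaks and trailing spaces; any value that does not match becomes `NA`.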
dataset %>%
dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
dplyr::select(Q6_1:Q6_8) %>%
tidyr::gather(questions, reasons) %>%
dplyr::mutate(questions = ifelse(questions == "Q6_5", "Educate the public on environmental issues", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q6_7", "Build relationships between scientists and the public who live and work near LTER Sites or LTSER Platforms", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q6_8", "Have greater influence on policy by collaborating with the public on scientific research", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q6_4", "Educate the public on how science research is conducted", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q6_6", "Bring in perspectives and ideas from the public that can inform scientific research", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q6_1", "Get help from the public by having them collect or classify data", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q6_3", "Make a grant proposal more competitive and appealing to funders by including citizen science", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q6_2", "Get help from the public in ways that are not limited to data collection and classification", questions)) %>%
dplyr::mutate(
reasons = factor(
reasons,
levels = c("Very low\r\nimportance", "Little\r\nimportance", "Moderate\r\nimportance", "High\r\nimportance", "Very high\r\nimportance")
) %>% forcats::fct_explicit_na()
) %>%
gtsummary::tbl_summary(
by = c(questions),
label = list(
reasons = "Reasons"
)
) %>%
gtsummary::modify_header(label = "**Questions**")
| Questions | Bring in perspectives and ideas from the public that can inform scientific research, N = 165¹ | Build relationships between scientists and the public who live and work near LTER Sites or LTSER Platforms, N = 165¹ | Educate the public on environmental issues, N = 165¹ | Educate the public on how science research is conducted, N = 165¹ | Get help from the public by having them collect or classify data, N = 165¹ | Get help from the public in ways that are not limited to data collection and classification, N = 165¹ | Have greater influence on policy by collaborating with the public on scientific research, N = 165¹ | Make a grant proposal more competitive and appealing to funders by including citizen science, N = 165¹ |
|---|---|---|---|---|---|---|---|---|
| Reasons | ||||||||
| Very low importance | 1 (0.6%) | 4 (2.4%) | 1 (0.6%) | 2 (1.2%) | 6 (3.6%) | 7 (4.2%) | 0 (0%) | 18 (11%) |
| Little importance | 15 (9.1%) | 6 (3.6%) | 1 (0.6%) | 13 (7.9%) | 31 (19%) | 28 (17%) | 13 (7.9%) | 24 (15%) |
| Moderate importance | 49 (30%) | 27 (16%) | 19 (12%) | 39 (24%) | 65 (39%) | 81 (49%) | 40 (24%) | 66 (40%) |
| High importance | 67 (41%) | 67 (41%) | 56 (34%) | 71 (43%) | 44 (27%) | 35 (21%) | 62 (38%) | 47 (28%) |
| Very high importance | 33 (20%) | 61 (37%) | 88 (53%) | 39 (24%) | 19 (12%) | 12 (7.3%) | 49 (30%) | 9 (5.5%) |
| (Missing) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (0.6%) | 0 (0%) | 2 (1.2%) | 1 (0.6%) | 1 (0.6%) |
| ¹ n (%) | ||||||||
dataset %>%
dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
dplyr::select(Q8_1:Q8_8) %>%
tidyr::gather(questions, barriers) %>%
dplyr::mutate(questions = ifelse(questions == "Q8_7", "Scientists do not have any or enough support to start and run a citizen science project", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q8_5", "It is difficult to create long-term stable relationships with the public, which are necessary to conduct scientific research", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q8_8", "Scientists do not get credit or acknowledgement for their work in citizen science", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q8_3", "It is too difficult or time-consuming to validate data collected or classified by the public", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q8_1", "The public does not have the necessary knowledge or skills to contribute to scientific research", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q8_2", "It is too difficult or time-consuming to teach the public the necessary knowledge or skills to contribute to scientific research", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q8_6", "It is not possible to acknowledge citizen science volunteers’ contribution in grants, presentations, and publications", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q8_4", "The public is not interested in helping with science research", questions)) %>%
dplyr::mutate(
barriers = factor(
barriers,
levels = c("Strongly\r\ndisagree", "Disagree", "Neither agree\r\nnor disagree", "Agree", "Strongly\r\nagree")
) %>% forcats::fct_explicit_na()
) %>%
gtsummary::tbl_summary(
by = c(questions),
label = list(
barriers = "Barriers"
)
) %>%
gtsummary::modify_header(label = "**Questions**")
| Questions | It is difficult to create long-term stable relationships with the public, which are necessary to conduct scientific research, N = 165¹ | It is not possible to acknowledge citizen science volunteers’ contribution in grants, presentations, and publications, N = 165¹ | It is too difficult or time-consuming to teach the public the necessary knowledge or skills to contribute to scientific research, N = 165¹ | It is too difficult or time-consuming to validate data collected or classified by the public, N = 165¹ | Scientists do not get credit or acknowledgement for their work in citizen science, N = 165¹ | Scientists do not have any or enough support to start and run a citizen science project, N = 165¹ | The public does not have the necessary knowledge or skills to contribute to scientific research, N = 165¹ | The public is not interested in helping with science research, N = 165¹ |
|---|---|---|---|---|---|---|---|---|
| Barriers | ||||||||
| Strongly disagree | 7 (4.2%) | 21 (13%) | 13 (7.9%) | 7 (4.2%) | 10 (6.1%) | 5 (3.0%) | 15 (9.1%) | 32 (19%) |
| Disagree | 30 (18%) | 60 (36%) | 49 (30%) | 37 (22%) | 34 (21%) | 22 (13%) | 39 (24%) | 81 (49%) |
| Neither agree nor disagree | 38 (23%) | 56 (34%) | 49 (30%) | 51 (31%) | 51 (31%) | 39 (24%) | 53 (32%) | 39 (24%) |
| Agree | 68 (41%) | 24 (15%) | 38 (23%) | 57 (35%) | 42 (25%) | 76 (46%) | 47 (28%) | 7 (4.2%) |
| Strongly agree | 21 (13%) | 4 (2.4%) | 16 (9.7%) | 12 (7.3%) | 27 (16%) | 21 (13%) | 11 (6.7%) | 5 (3.0%) |
| (Missing) | 1 (0.6%) | 0 (0%) | 0 (0%) | 1 (0.6%) | 1 (0.6%) | 2 (1.2%) | 0 (0%) | 1 (0.6%) |
| ¹ n (%) | ||||||||
dataset %>%
dplyr::filter(as.numeric(ProgressFirstPart) >= 75) %>%
dplyr::select(Q29_1:Q29_9) %>%
tidyr::gather(questions, impacts) %>%
dplyr::mutate(questions = ifelse(questions == "Q29_2", "My involvement in citizen science has given me a better understanding of what the public thinks about scientists and the work they do.", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q29_1", "My involvement in citizen science has given me insight into the concerns that the public has about science.", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q29_4", "My involvement in citizen science has helped me improve how I communicate about my work with stakeholders.", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q29_9", "My involvement in citizen science has helped me place my research in a broader context.", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q29_3", "My involvement in citizen science has given me an opportunity to learn from the public in ways that are relevant to the work that I do.", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q29_6", "My involvement in citizen science has helped me improve how I teach and mentor students and staff.", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q29_5", "My involvement in citizen science has helped me improve how I communicate about my work with scientists outside my field.", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q29_7", "My involvement in citizen science has influenced how I ask research questions.", questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q29_8", "My involvement in citizen science has influenced how I design studies, collect data, or analyze data.", questions)) %>%
dplyr::mutate(
impacts = factor(
impacts,
levels = c("Strongly\r\ndisagree", "Disagree", "Neither agree\r\nnor disagree", "Agree", "Strongly\r\nagree")
) %>% forcats::fct_explicit_na()
) %>%
gtsummary::tbl_summary(
by = c(questions),
label = list(
impacts = "Impacts"
)
) %>%
gtsummary::modify_header(label = "**Questions**")
| Questions | My involvement in citizen science has given me a better understanding of what the public thinks about scientists and the work they do., N = 165¹ | My involvement in citizen science has given me an opportunity to learn from the public in ways that are relevant to the work that I do., N = 165¹ | My involvement in citizen science has given me insight into the concerns that the public has about science., N = 165¹ | My involvement in citizen science has helped me improve how I communicate about my work with scientists outside my field., N = 165¹ | My involvement in citizen science has helped me improve how I communicate about my work with stakeholders., N = 165¹ | My involvement in citizen science has helped me improve how I teach and mentor students and staff., N = 165¹ | My involvement in citizen science has helped me place my research in a broader context., N = 165¹ | My involvement in citizen science has influenced how I ask research questions., N = 165¹ | My involvement in citizen science has influenced how I design studies, collect data, or analyze data., N = 165¹ |
|---|---|---|---|---|---|---|---|---|---|
| Impacts | |||||||||
| Strongly disagree | 1 (0.6%) | 3 (1.8%) | 0 (0%) | 3 (1.8%) | 1 (0.6%) | 5 (3.0%) | 1 (0.6%) | 6 (3.6%) | 6 (3.6%) |
| Disagree | 4 (2.4%) | 5 (3.0%) | 4 (2.4%) | 9 (5.5%) | 7 (4.2%) | 3 (1.8%) | 9 (5.5%) | 16 (9.7%) | 17 (10%) |
| Neither agree nor disagree | 9 (5.5%) | 17 (10%) | 15 (9.1%) | 25 (15%) | 12 (7.3%) | 22 (13%) | 14 (8.5%) | 20 (12%) | 26 (16%) |
| Agree | 54 (33%) | 34 (21%) | 46 (28%) | 32 (19%) | 36 (22%) | 35 (21%) | 43 (26%) | 32 (19%) | 21 (13%) |
| Strongly agree | 13 (7.9%) | 22 (13%) | 16 (9.7%) | 12 (7.3%) | 24 (15%) | 13 (7.9%) | 13 (7.9%) | 7 (4.2%) | 11 (6.7%) |
| (Missing) | 84 (51%) | 84 (51%) | 84 (51%) | 84 (51%) | 85 (52%) | 87 (53%) | 85 (52%) | 84 (51%) | 84 (51%) |
| ¹ n (%) | |||||||||
The ratio of CS-initiative participants (among respondents with a completeness >= 50 %) to all respondents with a completeness >= 50 % is:
partiRate <- round((participationVSResponses/poolP1)*100, 1)
54.5 %
The average number of CS projects per participant, among those with a completeness >= 75 % in the second part of the survey, is:
4.6
participationCSDifference <- dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(c(Q10, Q31:Q36)) #%>% View()
participationCSDifference$age <- as.numeric(format(Sys.Date(), "%Y")) - participationCSDifference$Q33
# Q35 Gender
participationCSDifference %>%
dplyr::group_by(Q35) %>%
# dplyr::summarise(totalCSInitiative = sum(Q10)) %>%
dplyr::count(Q35) %>%
dplyr::filter(n > 1) %>%
ggplot2::ggplot(ggplot2::aes(x = Q35, y = n)) +
ggplot2::geom_bar(stat = "identity", fill = "orange") +
ggplot2::xlab("Gender") + ggplot2::ylab("Participants in CS initiatives") +
ggplot2::geom_text(ggplot2::aes(label = n), vjust = 1.6, color = "white", size = 3.5) +
ggplot2::theme_classic()
participationCSDifference %>%
dplyr::select(Q10, Q35) %>%
gtsummary::tbl_summary(
label = list(
Q10 = "CS projects declared by participant (Q10)",
Q35 = "Gender participant (Q35)"
)
) %>%
gtsummary::modify_header(label = "**Questions**")
| Questions | N = 76¹ |
|---|---|
| CS projects declared by participant (Q10) | 3.0 (2.0, 4.0) |
| Gender participant (Q35) | |
| Female | 29 (39%) |
| Male | 45 (61%) |
| Unknown | 2 |
| ¹ Median (IQR); n (%) | |
participationCSDifference %>%
dplyr::select(Q10, Q35) %>%
gtsummary::tbl_summary(
by = Q35, # split table by group
missing = "no", # don't list missing data separately
statistic = list(all_continuous() ~ "{mean} ({sd})"),
label = list(Q10 = "CS projects declared (Q10)")
) %>%
gtsummary::add_n() %>% # add column with total number of non-missing observations
gtsummary::add_p() %>% # test for a difference between groups
gtsummary::modify_header(label = "**Variable**") %>% # update the column header
gtsummary::bold_labels()
| Variable | N | Female, N = 29¹ | Male, N = 45¹ | p-value² |
|---|---|---|---|---|
| CS projects declared (Q10) | 74 | 5.5 (10.1) | 4.1 (4.7) | 0.8 |
| ¹ Mean (SD) | ||||
| ² Wilcoxon rank sum test | ||||
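The p-value in the table above comes from a Wilcoxon rank sum test, as stated in its footnote. For readers who want to see the test outside gtsummary, here is a minimal sketch with base R's `wilcox.test()` on invented numbers; the data frame `projects` is illustrative and is not survey data.

```r
# Illustrative numbers only, not survey data
projects <- data.frame(
  Q10 = c(1, 2, 3, 5, 12, 2, 4, 6, 3, 7),
  Q35 = rep(c("Female", "Male"), each = 5)
)

# Group means and standard deviations, as reported in the table
group_stats <- aggregate(Q10 ~ Q35, data = projects,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))

# Rank-based comparison of the two groups; exact = FALSE uses the normal
# approximation, which avoids a warning when the data contain ties
res <- wilcox.test(Q10 ~ Q35, data = projects, exact = FALSE)
res$p.value
```

A rank-based test is a reasonable choice for project counts like these, where a few respondents declare many projects and the standard deviation can exceed the mean.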
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q13) %>%
# Q13 allows multiple choices stored as a comma-separated string: one row per choice
dplyr::mutate(Q13 = strsplit(as.character(Q13), ",")) %>%
tidyr::unnest(Q13) %>%
gtsummary::tbl_summary(
label = list(
Q13 = "Spatial scale of CS initiative"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 82¹ |
|---|---|
| Spatial scale of CS initiative | |
| International (at multiple national networks) | 10 (12%) |
| Local (at your site only) | 35 (43%) |
| National (at national network level) | 14 (17%) |
| Regional (at several sites in the same region) | 23 (28%) |
| ¹ n (%) | |
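The table above counts choices rather than respondents, which is why its N (82) exceeds the 76 respondents in this pool: each multiple-choice answer is split on commas into one row per choice. A base-R sketch of the same splitting on toy answers:

```r
# Toy multiple-choice answers stored as comma-separated strings
q13 <- c(
  "Local (at your site only),Regional (at several sites in the same region)",
  "Local (at your site only)",
  NA
)

# One element per selected option; an NA answer stays a single NA
choices <- unlist(strsplit(q13, ",", fixed = TRUE))

table(choices, useNA = "ifany")
```

Splitting on a bare comma assumes that no answer option contains a comma itself; an option such as "Junior (for example, post-doc, ...)" would be broken into pieces and would need a different separator.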
# dataset %>%
# dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
# dplyr::filter(Q10 > 0) %>%
# dplyr::select(c(Q10, Q13)) %>% # 73
# dplyr::count(Q13) %>%
# dplyr::filter(n > 1) %>%
# ggplot2::ggplot(ggplot2::aes(x = Q13, y = n)) +
# ggplot2::geom_bar(stat = "identity", fill = "green4") +
# ggplot2::xlab("") + ggplot2::ylab("Number of projects") +
# ggplot2::scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10)) +
# ggplot2::geom_text(ggplot2::aes(label = n), vjust = 1.6, color = "white", size = 3.5) +
# ggplot2::theme_classic()
sjPlot::set_theme(
base = ggplot2::theme_light(),
axis.tickslen = 0, # hides tick marks
axis.title.size = .9,
axis.textsize = .9,
geom.label.size = 3.5,
axis.title.y.vjust = 5
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(c(Q10, Q13)) %>%
# split the multiple-choice answers first, then group: a grouping column cannot be modified inside mutate()
dplyr::mutate(Q13 = strsplit(as.character(Q13), ",")) %>%
tidyr::unnest(Q13) %>%
dplyr::group_by(Q13) %>%
dplyr::mutate(freq = dplyr::n()) %>%
dplyr::ungroup() %>%
dplyr::filter(freq > 1) %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q13,
show.axis.values = FALSE,
axis.title = c("", "Number of projects")
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q56) %>%
gtsummary::tbl_summary(
label = list(
Q56 = "Projects duration (years)"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| Projects duration (years) | 4.0 (2.0, 7.0) |
| Unknown | 1 |
| ¹ Median (IQR) | |
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q12) %>%
dplyr::mutate(Q12 = factor(Q12) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
type = list(Q12 ~ "categorical"),
label = list(
Q12 = "Projects active/concluded"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| Projects active/concluded | |
| No | 23 (30%) |
| Yes | 52 (68%) |
| (Missing) | 1 (1.3%) |
| ¹ n (%) | |
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(c(Q12, Q56)) %>%
dplyr::group_by(Q12, Q56) %>%
dplyr::summarise(freq = dplyr::n()) %>%
dplyr::filter(!is.na(Q12)) %>%
ggplot2::ggplot(ggplot2::aes(x = Q12, y = Q56)) +
ggplot2::geom_point(ggplot2::aes(size = freq), colour = "#1F78B4") +
ggplot2::xlab("Is the project still active?") +
# y shows Q56 (duration in years); point size shows freq (number of projects)
ggplot2::ylab("Project duration (years)") +
ggplot2::labs(size = "Number of projects") +
ggplot2::scale_size_continuous(
breaks = c(2, 4, 6, 8),
labels = c('<=2', '4', '6', '>=8')
) +
ggplot2::theme_bw()
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q14) %>%
gtsummary::tbl_summary(
label = list(
Q14 = "Research focus"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| Research focus | |
| Biology | 20 (26%) |
| Environmental science | 38 (50%) |
| Global change | 6 (7.9%) |
| Hydrology | 5 (6.6%) |
| Limnology | 2 (2.6%) |
| Management | 3 (3.9%) |
| Oceanography | 2 (2.6%) |
| ¹ n (%) | |
sjPlot::set_theme(
base = ggplot2::theme_light(),
axis.tickslen = 0, # hides tick marks
axis.title.size = .9,
axis.textsize = .9,
geom.label.size = 3.5,
axis.title.y.vjust = 5
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(c(Q10, Q14)) %>%
dplyr::group_by(Q14) %>%
dplyr::mutate(freq = n()) %>%
dplyr::ungroup() %>%
dplyr::filter(freq > 1) %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q14,
show.axis.values = FALSE
)
# Thanks to this guide http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know
strings_Q15 <- dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q15) %>%
dplyr::pull(Q15)
docs <- tm::Corpus(tm::VectorSource(strings_Q15))
# Cleaning the text
toSpace <- tm::content_transformer(function (x , pattern) gsub(pattern, " ", x))
docs <- tm::tm_map(docs, toSpace, "/")
docs <- tm::tm_map(docs, toSpace, "@")
docs <- tm::tm_map(docs, toSpace, "\\|")
# Convert the text to lower case
docs <- tm::tm_map(docs, tm::content_transformer(tolower))
# Remove numbers
docs <- tm::tm_map(docs, tm::removeNumbers)
# Remove english common stopwords
docs <- tm::tm_map(docs, tm::removeWords, tm::stopwords("english"))
# Remove your own stop word
# specify your stopwords as a character vector
# docs <- tm::tm_map(docs, tm::removeWords, c("blabla1", "blabla2"))
# Remove punctuations
docs <- tm::tm_map(docs, tm::removePunctuation)
# Eliminate extra white spaces
docs <- tm::tm_map(docs, tm::stripWhitespace)
# Text stemming
# docs <- tm::tm_map(docs, tm::stemDocument)
# Build a term-document matrix
dtm <- tm::TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
# Generate the Word cloud
set.seed(1234)
wordcloud::wordcloud(words = d$word, freq = d$freq, min.freq = 1,
max.words = 200, random.order = FALSE, rot.per = 0.05,
colors = RColorBrewer::brewer.pal(8, "Dark2"))
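The word cloud above is driven by a word-frequency table built with the tm package. As a rough illustration of what that table contains, the base-R sketch below counts words in two invented answers. It skips the stop-word removal step and is not a replacement for the tm pipeline.

```r
# Invented free-text answers standing in for Q15
answers <- c("Water quality monitoring", "Bird monitoring / phenology", NA)

# Drop missing answers, replace punctuation with spaces, convert to lower case
clean <- tolower(gsub("[[:punct:]]", " ", answers[!is.na(answers)]))

# Split on runs of whitespace and discard empty strings
words <- unlist(strsplit(clean, "\\s+"))
words <- words[nzchar(words)]

# Most frequent words first
freq <- sort(table(words), decreasing = TRUE)
freq
```

In the word cloud, the size of each word follows these frequencies, so "monitoring" would be drawn largest in this toy example.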
# dataset %>%
# dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
# dplyr::filter(Q10 > 0) %>%
# dplyr::select(Q10, Q17, Q18:Q18_8_TEXT, Q19:Q19_4_TEXT) %>%
# dplyr::group_by(Q17) %>%
# dplyr::summarise(numProj = n()) %>%
# dplyr::mutate(Q17 = forcats::fct_relevel(Q17, "Fewer than 25", "25-50", "51-100", "101-500", "More than 500", "NA")) %>%
# ggplot2::ggplot(ggplot2::aes(x = Q17, y = numProj)) +
# ggplot2::xlab("n participants/year") + ggplot2::ylab("Number of projects") +
# ggplot2::geom_bar(stat = "identity", fill = "blue4") +
# ggplot2::geom_text(ggplot2::aes(label = numProj), vjust = 1.6, color = "white", size = 3.5) +
# ggplot2::theme_classic()
sjPlot::set_theme(
base = ggplot2::theme_light(),
axis.tickslen = 0, # hides tick marks
axis.title.size = .9,
axis.textsize = .9,
geom.label.size = 3.5,
axis.title.y.vjust = 5
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q10, Q17) %>%
dplyr::mutate(Q17 = forcats::fct_relevel(Q17, "Fewer than 25", "25-50", "51-100", "101-500", "More than 500", "NA")) %>%
sjPlot::plot_frq(
Q17,
show.axis.values = FALSE,
axis.title = c("", "Number of projects")
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q17) %>%
dplyr::mutate(Q17 = factor(Q17) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q17 = "n participants/year"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| n participants/year | |
| 101-500 | 7 (9.2%) |
| 25-50 | 12 (16%) |
| 51-100 | 10 (13%) |
| Fewer than 25 | 39 (51%) |
| More than 500 | 8 (11%) |
| ¹ n (%) | |
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q18) %>%
dplyr::mutate(Q18 = strsplit(as.character(Q18), ",")) %>%
tidyr::unnest(Q18) %>%
gtsummary::tbl_summary(
label = list(
Q18 = "Group types"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 192¹ |
|---|---|
| Group types | |
| Adults who are in an organized group (for example birding club) | 32 (17%) |
| Adults who are not in an organized group | 49 (26%) |
| Children and young adults participating as part of a school program (18 years or younger) | 23 (12%) |
| Children and young adults participating through an "out-of-school" program (18 years or younger) | 12 (6.2%) |
| Families with adults and children/young adults | 15 (7.8%) |
| Other | 19 (9.9%) |
| Senior (over 65 years) | 21 (11%) |
| Undergraduate students | 21 (11%) |
| ¹ n (%) | |
sjPlot::set_theme(
base = ggplot2::theme_light(),
axis.tickslen = 0, # hides tick marks
axis.title.size = .9,
axis.textsize = .9,
geom.label.size = 3.5,
axis.title.y.vjust = 5
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q18) %>%
dplyr::mutate(Q18 = strsplit(as.character(Q18), ",")) %>%
tidyr::unnest(Q18) %>%
dplyr::group_by(Q18) %>%
dplyr::mutate(freq = n()) %>%
dplyr::ungroup() %>%
dplyr::filter(freq > 1) %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q18,
show.axis.values = FALSE,
axis.title = c("", "Number of projects"),
geom.colors = "red4"
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q19) %>%
dplyr::mutate(Q19 = strsplit(as.character(Q19), ",")) %>%
tidyr::unnest(Q19) %>%
dplyr::mutate(Q19 = factor(Q19) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q19 = "Group types"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 971 |
|---|---|
| Group types | |
| Other | 14 (14%) |
| Volunteers who are a minority group(s) in your region | 10 (10%) |
| Volunteers who have limited financial resources | 19 (20%) |
| Volunteers who live in rural areas | 19 (20%) |
| We do not target underserved community members. | 28 (29%) |
| (Missing) | 7 (7.2%) |
| ¹ n (%) | |
sjPlot::set_theme(
base = ggplot2::theme_light(),
axis.tickslen = 0, # hides tick marks
axis.title.size = .9,
axis.textsize = .9,
geom.label.size = 3.5,
axis.title.y.vjust = 5
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q19) %>%
dplyr::mutate(Q19 = strsplit(as.character(Q19), ",")) %>%
tidyr::unnest(Q19) %>%
dplyr::group_by(Q19) %>%
dplyr::mutate(freq = n()) %>%
dplyr::ungroup() %>%
dplyr::filter(freq > 1) %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q19,
show.axis.values = FALSE,
axis.title = c("", "Number of projects"),
geom.colors = "red4"
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q20) %>%
dplyr::mutate(Q20 = factor(Q20) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q20 = "Participation frequency"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| Participation frequency | |
| Four to six times | 10 (13%) |
| More than six times | 17 (22%) |
| Once | 12 (16%) |
| Two to three times | 25 (33%) |
| (Missing) | 12 (16%) |
| ¹ n (%) | |
sjPlot::set_theme(
base = ggplot2::theme_light(),
axis.tickslen = 0, # hides tick marks
axis.title.size = .9,
axis.textsize = .9,
geom.label.size = 3.5,
axis.title.y.vjust = 5
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q20) %>%
dplyr::mutate(Q20 = forcats::fct_relevel(Q20,
"Once", "Two to three times", "Four to six times", "More than six times")) %>%
dplyr::group_by(Q20) %>%
dplyr::mutate(freq = n()) %>%
dplyr::ungroup() %>%
dplyr::filter(freq > 1) %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q20,
show.axis.values = FALSE,
axis.title = c("", "Number of projects"),
geom.colors = "red4"
)
matrix7e <- dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q21_1:Q21_13) %>%
tidyr::gather(questions, typeOfInvol) %>%
dplyr::group_by(questions, typeOfInvol) %>%
dplyr::count() %>%
dplyr::ungroup() %>%
tidyr::spread(questions, n) %>%
t() %>%
data.frame(row.names(.), ., row.names = NULL) %>%
`colnames<-`(c('Activity in CS', 'High involvement', 'Moderate involvement', 'Not at all involved', 'Very high involvement', 'Very little involvement', 'NA')) %>%
.[-1,-1] %>% .[,-6] %>%
as.matrix() %>%
`rownames<-`(c(
'Help define research questions',
'Help interpret data and draw conclusions',
'Help disseminate conclusions',
'Help translate the results into action',
'Help discuss results and ask new questions',
'Help gather information and resources for research',
'Help develop hypotheses',
'Help design data collection methodologies',
'Help collect samples or record data',
'Help classify data',
'Help process samples',
'Help validate data',
'Help analyze data'
)) %>%
reshape2::melt()
matrix7e$Var2 <- factor(matrix7e$Var2, levels = c("Not at all involved", "Very little involvement", "Moderate involvement", "High involvement", "Very high involvement"))
matrix7e <- matrix7e[matrix7e$value!=0,]
matrix7e <- matrix7e[!is.na(matrix7e$value),]
# aes() and element_text() are namespaced like the other ggplot2 calls,
# so the chunk does not depend on ggplot2 being attached
ggplot2::ggplot(matrix7e, ggplot2::aes(x = Var2, y = Var1)) +
ggplot2::geom_raster(ggplot2::aes(fill = as.numeric(value))) +
ggplot2::scale_fill_gradient(low = "grey90", high = "red4", na.value = "grey10", guide = "colourbar") +
ggplot2::labs(x = "Degree of Involvement", y = "Type of Involvement", fill = "n of answers") +
ggplot2::scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10)) +
ggplot2::scale_y_discrete(labels = function(x) stringr::str_wrap(x, width = 30)) +
ggplot2::theme_classic() +
ggplot2::theme(
axis.text.x = ggplot2::element_text(size = 8, angle = 0, vjust = 0.3),
axis.text.y = ggplot2::element_text(size = 8),
plot.title = ggplot2::element_text(size = 11)
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q21_1:Q21_13) %>%
tidyr::gather(questions, involvements) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_1", 'Help define research questions', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_10", 'Help interpret data and draw conclusions', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_11", 'Help disseminate conclusions', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_12", 'Help translate the results into action', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_13", 'Help discuss results and ask new questions', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_2", 'Help gather information and resources for research', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_3", 'Help develop hypotheses', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_4", 'Help design data collection methodologies', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_5", 'Help collect samples or record data', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_6", 'Help classify data', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_7", 'Help process samples', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_8", 'Help validate data', questions)) %>%
dplyr::mutate(questions = ifelse(questions == "Q21_9", 'Help analyze data', questions)) %>%
dplyr::mutate(
involvements = factor(
involvements,
levels = c("Not at all\r\ninvolved", "Very little\r\ninvolvement", "Moderate\r\ninvolvement", "High\r\ninvolvement", "Very high\r\ninvolvement")
) %>% forcats::fct_explicit_na()
) %>%
gtsummary::tbl_summary(
by = c(questions),
label = list(
involvements = "Involvements"
)
) %>%
gtsummary::modify_header(label = "**Questions**")
| Questions | Help analyze data, N = 76¹ | Help classify data, N = 76¹ | Help collect samples or record data, N = 76¹ | Help define research questions, N = 76¹ | Help design data collection methodologies, N = 76¹ | Help develop hypotheses, N = 76¹ | Help discuss results and ask new questions, N = 76¹ | Help disseminate conclusions, N = 76¹ | Help gather information and resources for research, N = 76¹ | Help interpret data and draw conclusions, N = 76¹ | Help process samples, N = 76¹ | Help translate the results into action, N = 76¹ | Help validate data, N = 76¹ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Involvements | |||||||||||||
| Not at all involved | 44 (58%) | 13 (17%) | 4 (5.3%) | 28 (37%) | 24 (32%) | 27 (36%) | 11 (14%) | 19 (25%) | 13 (17%) | 28 (37%) | 35 (46%) | 20 (26%) | 24 (32%) |
| Very little involvement | 16 (21%) | 24 (32%) | 6 (7.9%) | 16 (21%) | 29 (38%) | 21 (28%) | 18 (24%) | 7 (9.2%) | 17 (22%) | 22 (29%) | 17 (22%) | 16 (21%) | 20 (26%) |
| Moderate involvement | 12 (16%) | 18 (24%) | 9 (12%) | 16 (21%) | 15 (20%) | 23 (30%) | 15 (20%) | 16 (21%) | 19 (25%) | 9 (12%) | 14 (18%) | 15 (20%) | 15 (20%) |
| High involvement | 4 (5.3%) | 17 (22%) | 29 (38%) | 11 (14%) | 8 (11%) | 3 (3.9%) | 17 (22%) | 23 (30%) | 16 (21%) | 15 (20%) | 3 (3.9%) | 22 (29%) | 14 (18%) |
| Very high involvement | 0 (0%) | 4 (5.3%) | 28 (37%) | 4 (5.3%) | 0 (0%) | 2 (2.6%) | 15 (20%) | 11 (14%) | 10 (13%) | 2 (2.6%) | 5 (6.6%) | 3 (3.9%) | 3 (3.9%) |
| (Missing) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (1.3%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (1.3%) | 0 (0%) | 2 (2.6%) | 0 (0%) | 0 (0%) |
| ¹ n (%) | |||||||||||||
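
The heat map and the table for Q21 both start by reshaping the thirteen `Q21_*` columns into a long format with `tidyr::gather()`, which current tidyr releases mark as superseded by `tidyr::pivot_longer()`. The sketch below shows how the same per-question counts could be obtained in long format; the `toy` data frame is hypothetical and has only two of the thirteen columns:

```r
library(dplyr)
library(tidyr)

# Hypothetical answers for two Q21 items from three respondents
toy <- tibble::tibble(
  Q21_1 = c("High involvement", "Not at all involved", "High involvement"),
  Q21_2 = c("Moderate involvement", "Moderate involvement", NA)
)

# Ordered answer scale, lowest to highest involvement
involvement_levels <- c(
  "Not at all involved", "Very little involvement", "Moderate involvement",
  "High involvement", "Very high involvement"
)

toy_counts <- toy %>%
  tidyr::pivot_longer(
    dplyr::everything(),
    names_to = "questions",
    values_to = "typeOfInvol"
  ) %>%
  dplyr::filter(!is.na(typeOfInvol)) %>%  # skip unanswered items
  dplyr::mutate(typeOfInvol = factor(typeOfInvol, levels = involvement_levels)) %>%
  dplyr::count(questions, typeOfInvol)    # one row per question and answer level
```

A long table of this shape (question, answer level, count) can be passed to `ggplot2::geom_raster()` directly, which would avoid the transpose and the manual column renaming, at the cost of mapping the `Q21_*` codes to readable labels in a separate step.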
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q22) %>%
dplyr::mutate(Q22 = strsplit(as.character(Q22), ",")) %>%
tidyr::unnest(Q22) %>%
dplyr::mutate(Q22 = factor(Q22) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q22 = "Training methodologies"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 134¹ |
|---|---|
| Training methodologies | |
| Mandatory face-to-face short workshop (1 day or less) | 10 (7.5%) |
| Mandatory multi-day training or certification course (more than 1 day) | 5 (3.7%) |
| Mandatory online tutorials | 1 (0.7%) |
| Other | 13 (9.7%) |
| Voluntary face-to-face short workshop (1 day or less) | 47 (35%) |
| Voluntary online tutorials | 9 (6.7%) |
| We do not provide any training or support for volunteers. | 4 (3.0%) |
| Written online or print instructions | 44 (33%) |
| (Missing) | 1 (0.7%) |
| ¹ n (%) | |
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q22) %>%
dplyr::mutate(Q22 = strsplit(as.character(Q22), ",")) %>%
tidyr::unnest(Q22) %>%
dplyr::group_by(Q22) %>%
dplyr::mutate(freq = n()) %>%
dplyr::ungroup() %>%
dplyr::filter(freq > 1) %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q22,
show.axis.values = FALSE,
axis.title = c("", "Number of projects"),
geom.colors = "yellow4"
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q23) %>%
dplyr::mutate(Q23 = strsplit(as.character(Q23), ",")) %>%
tidyr::unnest(Q23) %>%
dplyr::mutate(Q23 = factor(Q23) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q23 = "Data type"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 206¹ |
|---|---|
| Data type | |
| Boolean | 23 (11%) |
| Geographic coordinates | 42 (20%) |
| Images | 40 (19%) |
| Numeric | 53 (26%) |
| Other | 11 (5.3%) |
| Textual | 34 (17%) |
| We do not have volunteers collect data. | 2 (1.0%) |
| (Missing) | 1 (0.5%) |
| ¹ n (%) | |
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q10, Q23) %>%
dplyr::mutate(Q23 = strsplit(as.character(Q23), ",")) %>%
tidyr::unnest(Q23) %>%
dplyr::group_by(Q23) %>%
dplyr::mutate(freq = n()) %>%
dplyr::ungroup() %>%
dplyr::filter(freq > 3) %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q23,
show.axis.values = FALSE,
axis.title = c("", "Number of projects"),
geom.colors = "violetred4"
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q25) %>%
dplyr::mutate(Q25 = strsplit(as.character(Q25), ",")) %>%
tidyr::unnest(Q25) %>%
dplyr::mutate(Q25 = factor(Q25) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q25 = "Quality check"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 157¹ |
|---|---|
| Quality check | |
| Other | 7 (4.5%) |
| These data are checked by other volunteers | 16 (10%) |
| These data are checked by scientists | 63 (40%) |
| These data are checked using automated filters | 10 (6.4%) |
| These data are compared to data submitted by other volunteers or by scientists | 15 (9.6%) |
| These data are compared to volunteers' statements about their confidence in the quality of their submitted data | 9 (5.7%) |
| These data are confirmed through photos vouchers or samples that volunteers submitted with their data | 20 (13%) |
| These data are cross-checked for consistency with existing literature or other repositories | 13 (8.3%) |
| We do not check (validate) data collected or classified by volunteers. | 3 (1.9%) |
| (Missing) | 1 (0.6%) |
| ¹ n (%) | |
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q10, Q25) %>%
dplyr::mutate(Q25 = strsplit(as.character(Q25), ",")) %>%
tidyr::unnest(Q25) %>%
dplyr::group_by(Q25) %>%
dplyr::mutate(freq = n()) %>%
dplyr::ungroup() %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q25,
show.axis.values = FALSE,
axis.title = c("", "Number of projects"),
geom.colors = "violetred4"
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q24) %>%
dplyr::mutate(Q24 = factor(Q24) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q24 = "Data sharing methodology"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| Data sharing methodology | |
| Data are not uploaded to an online system. | 26 (34%) |
| Volunteers can access all data collected or classified by all volunteers and scientists. | 30 (39%) |
| Volunteers can access only the data that they collected or classified. | 10 (13%) |
| (Missing) | 10 (13%) |
| ¹ n (%) | |
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q10, Q24) %>%
dplyr::group_by(Q24) %>%
dplyr::mutate(freq = n()) %>%
dplyr::ungroup() %>%
dplyr::filter(freq > 1) %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q24,
show.axis.values = FALSE,
axis.title = c("", "Number of projects"),
geom.colors = "chocolate4"
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q26) %>%
dplyr::mutate(
# reduce each response to the answer option it starts with
Q26 = dplyr::case_when(
stringr::str_starts(Q26, "At in-person meetings") ~ "At in-person meetings",
stringr::str_starts(Q26, "At virtual meetings") ~ "At virtual meetings",
stringr::str_starts(Q26, "Through indirect online communications") ~ "Through indirect online communications",
stringr::str_starts(Q26, "Through direct online communications") ~ "Through direct online communications",
stringr::str_starts(Q26, "Through written newsletters or reports") ~ "Through written newsletters or reports",
stringr::str_starts(Q26, "We do not share findings with volunteers.") ~ "We do not share findings with volunteers.",
stringr::str_starts(Q26, "Other") ~ "Other",
TRUE ~ Q26 # anything else, including missing values, is left unchanged
)
) %>%
dplyr::mutate(Q26 = factor(Q26) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q26 = "Ways to share findings"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 76¹ |
|---|---|
| Ways to share findings | |
| At in-person meetings | 38 (50%) |
| At virtual meetings | 2 (2.6%) |
| Other | 4 (5.3%) |
| Through direct online communications | 14 (18%) |
| Through indirect online communications | 11 (14%) |
| Through written newsletters or reports | 2 (2.6%) |
| We do not share findings with volunteers. | 4 (5.3%) |
| (Missing) | 1 (1.3%) |
| ¹ n (%) | |
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q10, Q26) %>%
dplyr::mutate(
# reduce each response to the answer option it starts with
Q26 = dplyr::case_when(
stringr::str_starts(Q26, "At in-person meetings") ~ "At in-person meetings",
stringr::str_starts(Q26, "At virtual meetings") ~ "At virtual meetings",
stringr::str_starts(Q26, "Through indirect online communications") ~ "Through indirect online communications",
stringr::str_starts(Q26, "Through direct online communications") ~ "Through direct online communications",
stringr::str_starts(Q26, "Through written newsletters or reports") ~ "Through written newsletters or reports",
stringr::str_starts(Q26, "We do not share findings with volunteers.") ~ "We do not share findings with volunteers.",
stringr::str_starts(Q26, "Other") ~ "Other",
TRUE ~ Q26 # anything else, including missing values, is left unchanged
)
) %>%
# group only after recoding, so that freq counts the collapsed categories
dplyr::group_by(Q26) %>%
dplyr::mutate(freq = n()) %>%
dplyr::ungroup() %>%
dplyr::filter(freq > 1) %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q26,
show.axis.values = FALSE,
axis.title = c("", "Number of projects"),
geom.colors = "chocolate4"
)
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q27) %>%
dplyr::mutate(Q27 = strsplit(as.character(Q27), ",")) %>%
tidyr::unnest(Q27) %>%
dplyr::mutate(Q27 = factor(Q27) %>% forcats::fct_explicit_na()) %>%
gtsummary::tbl_summary(
label = list(
Q27 = "Volunteer acknowledgement"
)
) %>%
gtsummary::modify_header(label = "**Responses**")
| Responses | N = 106¹ |
|---|---|
| Volunteer acknowledgement | |
| Other | 7 (6.6%) |
| Volunteers are acknowledged in the Acknowledgments section | 51 (48%) |
| Volunteers are listed as co-authors | 13 (12%) |
| Volunteers' contribution is described within the report or journal article (for example the Methods section) | 27 (25%) |
| We do not acknowledge volunteers' contribution. | 3 (2.8%) |
| (Missing) | 5 (4.7%) |
| ¹ n (%) | |
dataset %>%
dplyr::filter(as.numeric(ProgressCS) >= 75) %>%
dplyr::filter(Q10 > 0) %>%
dplyr::select(Q10, Q27) %>%
dplyr::mutate(Q27 = strsplit(as.character(Q27), ",")) %>%
tidyr::unnest(Q27) %>%
dplyr::group_by(Q27) %>%
dplyr::mutate(freq = n()) %>%
dplyr::ungroup() %>%
dplyr::select(-freq) %>%
sjPlot::plot_frq(
Q27,
show.axis.values = FALSE,
axis.title = c("", "Number of projects"),
geom.colors = "chocolate4"
)
# listOfAllSites <- ReLTER::get_ilter_generalinfo()
# saveRDS(listOfAllSites, file = "ilter_sitesData.rds")
# listOfAllSites <- readRDS(file = "ilterSitesData.rds")
# remove the sites without geometry
# listOfAllSites <- listOfAllSites[c(1:1226, 1228:1237, 1239:1240, 1242:1243, 1248:1249), ]
#
# siteWithDeimsId <- dataset %>%
# dplyr::select(Q30) %>%
# .[-160,] %>%
# dplyr::filter(Q30 != "NA") %>%
# dplyr::add_row(Q30 = c(
# "https://deims.org/664177a4-a21a-4f59-9601-00909e275868",
# "https://deims.org/5a38fc08-5257-4b13-8465-1d50ea166b95",
# "https://deims.org/96ba6c55-a555-4e96-a3e6-14d6dfe8785b",
# "https://deims.org/923cb154-83c9-444d-817a-cde7879c09b5"
# )) %>%
# unique() # 84 DEIMS.iD
# sitesOnSurvey <- listOfAllSites[listOfAllSites$uri %in% siteWithDeimsId$Q30, ] # 84 sites compared with ILTER formal sites
# collect biogeographical region and biome from DEIMS site
# sitesOnSurveyEnvChar <- lapply(
# as.list(sitesOnSurvey$uri),
# FUN = function(x) {ReLTER::get_site_info(x, category = c("EnvCharacts"))}
# ) %>%
# dplyr::bind_rows() %>%
# dplyr::select(uri, envCharacteristics.biogeographicalRegion, envCharacteristics.biome)
# sitesOnSurvey_2 <- merge(x = sitesOnSurvey, y = sitesOnSurveyEnvChar, by.x = "uri", by.y = "uri", all = T)
# saveRDS(sitesOnSurvey_2, file = "sitesOnSurvey_2.rds")
sitesOnSurvey_2 <- readRDS(file = "sitesOnSurvey_2.rds")
biomeNum <- sitesOnSurvey_2$envCharacteristics.biome[-68] %>% unique() %>% length()
getPalette <- grDevices::colorRampPalette(RColorBrewer::brewer.pal(12, "Set3"))
# Biome map plot
library("rnaturalearth")
library("rnaturalearthdata")
world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
ggplot2::ggplot(data = world) +
ggplot2::geom_sf() +
ggplot2::xlab("Longitude") + ggplot2::ylab("Latitude") +
ggplot2::scale_y_continuous(limits = c(-90, 90), expand = c(0, 0)) +
ggplot2::scale_x_continuous(expand = c(0, 0)) +
ggplot2::geom_sf(
data = sitesOnSurvey_2$geometry[-68],
size = 1,
ggplot2::aes(
color = sitesOnSurvey_2$envCharacteristics.biome[-68]
)
) + # feature 68 lacks biome information in DEIMS-SDR
# the points are mapped to colour (not fill), so the palette is set on the colour scale
ggplot2::scale_color_manual(name = "Biome", values = getPalette(biomeNum)) +
ggplot2::ggtitle("ILTER sites in the survey")
# Biogeographical Region map plot
nc <- sf::st_read("../TeaBagCatalogue/Maps_export/Zonobiome_poly.shp", quiet = TRUE)
ggplot2::ggplot() +
ggplot2::scale_y_continuous(limits = c(-90, 90), expand = c(0, 0)) +
ggplot2::scale_x_continuous(expand = c(0, 0)) +
ggplot2::geom_sf(data = nc, ggplot2::aes(fill = Legend), lwd = 0) +
ggplot2::geom_sf(data = sitesOnSurvey_2$geometry[-68], color = "black", size = 1) +
ggplot2::scale_fill_discrete(name = "Biogeographical Region")
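The `getPalette()` helper defined before the biome map exists because the ColorBrewer `Set3` palette offers at most 12 colours, while the number of biomes may be larger. `grDevices::colorRampPalette()` returns a function that interpolates between the given colours to produce any requested number. A minimal sketch, assuming the RColorBrewer package is installed and using a made-up count of 15 biomes:

```r
# Build an interpolating palette function from the 12 Set3 colours
getPalette <- grDevices::colorRampPalette(RColorBrewer::brewer.pal(12, "Set3"))

# 15 is a hypothetical number of biomes, chosen only to exceed the 12 base colours
biome_colours <- getPalette(15)
```

The result is a character vector of hex colour codes that can be passed as `values` to a manual ggplot2 scale. Interpolated colours are closer to each other than the original twelve, so neighbouring categories may be harder to tell apart.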
We thank the ILTER Coordination Committee and Secretariat for helping to share the survey widely across the whole network, and all respondents from the ILTER community for their willingness to participate.
All analyses were performed in R (R Core Team 2020).
This article was created with the distill R package (Dervieux et al. 2022). Summary tables were made with gtsummary (Sjoberg et al. 2021), plots with ggplot2 (Wickham 2016) and sjPlot (Lüdecke 2022), and maps with leaflet (Cheng, Karambelkar, and Xie 2022), leaflet.extras (Karambelkar and Schloerke 2018) and rnaturalearth (South 2022). The tidyr (Wickham and Girlich 2022), dplyr (Wickham et al. 2022), tm - Text Mining (Feinerer, Hornik, and Meyer 2008), wordcloud (Fellows 2018) and stringr (Wickham 2022) packages were used to prepare the data for analysis. The ReLTER package (Oggioni et al. 2022) was used to access information on ILTER sites in DEIMS-SDR (Wohner et al. 2019).
For attribution, please cite this work as
Oggioni & Bergami (2022, Oct. 10). Statistical analysis for exploring environmental Citizen Science practices and scientists' attitudes at ILTER. Retrieved from https://oggioniale.github.io/CSSurveyAnalysis/
BibTeX citation
@misc{oggioniCSI2022,
author = {Oggioni, Alessandro and Bergami, Caterina},
title = {Statistical analysis for exploring environmental Citizen Science practices and scientists' attitudes at ILTER},
url = {https://oggioniale.github.io/CSSurveyAnalysis/},
year = {2022}
}